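Historically, patient datasets have been used to develop and validate reconstruction algorithms for PET/MRI and PET/CT. To enable such algorithm development without acquiring hundreds of patient exams, we demonstrate in this paper a deep learning technique that generates synthetic but realistic whole-body PET sinograms from abundantly available whole-body MRI. Specifically, we use a dataset of 56 $^{18}$F-FDG-PET/MRI exams to train a 3D residual UNet to predict physiologic PET uptake from whole-body T1-weighted MRI. In training, we implemented a balanced loss function to generate realistic uptake across a large dynamic range, and computed losses along tomographic lines of response to mimic the PET acquisition. The predicted PET images are forward-projected to produce synthetic PET time-of-flight (TOF) sinograms that can be used with vendor-provided PET reconstruction algorithms, including those using CT-based attenuation correction (CTAC) and MR-based attenuation correction (MRAC). The resulting synthetic data recapitulates physiologic $^{18}$F-FDG uptake, for example high uptake localized to the brain and bladder, as well as uptake in the liver, kidneys, heart, and muscle. To simulate abnormalities with high uptake, we also insert synthetic lesions. We demonstrate that this synthetic PET data can be used interchangeably with real PET data for the PET quantification task of comparing CT-based and MR-based attenuation correction methods, achieving $\leq 7.6\%$ error in the mean compared with using real data. Together, these results show that the proposed synthetic PET data pipeline can reasonably be used for the development, evaluation, and validation of PET/MRI reconstruction methods.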
translated by 谷歌翻译
Agents that can follow language instructions are expected to be useful in a variety of situations such as navigation. However, training neural network-based agents requires numerous paired trajectories and language instructions. This paper proposes using multimodal generative models for semi-supervised learning in instruction-following tasks. The models learn a shared representation of the paired data, and enable semi-supervised learning by reconstructing unpaired data through that representation. The key challenges in applying these models to sequence-to-sequence tasks, including instruction following, are learning a shared representation of variable-length multimodal data and incorporating attention mechanisms. To address these problems, this paper proposes a novel network architecture that absorbs the difference in sequence lengths across the multimodal data. In addition, to further improve performance, this paper shows how to combine the generative model-based approach with an existing semi-supervised method called the speaker-follower model, and proposes a regularization term that improves inference using unpaired trajectories. Experiments on the BabyAI and Room-to-Room (R2R) environments show that the proposed method improves instruction-following performance by leveraging unpaired data, and improves the performance of the speaker-follower model by 2\% to 4\% in R2R.
We present a lightweight post-processing method to refine the semantic segmentation results of point cloud sequences. Most existing methods segment frame by frame and encounter the inherent ambiguity of the problem: based on a measurement in a single frame, labels are sometimes difficult to predict even for humans. To remedy this problem, we propose to explicitly train a network to refine the results predicted by an existing segmentation method. The network, which we call P2Net, learns the consistency constraints between coincident points from consecutive frames after registration. We evaluate the proposed post-processing method both qualitatively and quantitatively on the SemanticKITTI dataset, which consists of real outdoor scenes. The effectiveness of the proposed method is validated by comparing the results predicted by two representative networks with and without refinement by the post-processing network. Specifically, qualitative visualization validates the key idea that the labels of points that are difficult to predict can be corrected with P2Net. Quantitatively, overall mIoU is improved from 10.5% to 11.7% for PointNet [1] and from 10.8% to 15.9% for PointNet++ [2].
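The mIoU figures quoted above are averages of per-class intersection-over-union scores. As a minimal sketch of that metric (not the SemanticKITTI evaluation code; the function name `mean_iou` and the toy labels are assumptions made for illustration), per-point label arrays can be scored as follows:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for per-point labels.

    Classes that appear in neither the prediction nor the ground truth
    are skipped so that they do not distort the average.
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both arrays
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy labels for six points and three classes (hypothetical data).
target = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. A refinement network such as the one described would be judged by whether this score rises after post-processing.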
This paper presents a portrait stylization method designed for real-time mobile applications with limited style examples available. Previous learning-based stylization methods suffer from the geometric and semantic gaps between the portrait domain and the style domain, which prevent style information from being correctly transferred to the portrait images and lead to poor stylization quality. Based on the geometric prior of human facial attributes, we propose to use geometric alignment to tackle this issue. First, we apply Thin-Plate-Spline (TPS) warping to feature maps in the generator network and also directly to style images in pixel space, generating aligned portrait-style image pairs with identical landmarks, which closes the geometric gap between the two domains. Second, adversarial learning maps the textures and colors of portrait images to the style domain. Finally, geometry-aware cycle consistency keeps the content and identity information unchanged, and a deformation-invariant constraint suppresses artifacts and distortions. Qualitative and quantitative comparisons show that our method outperforms existing methods, and experiments show that our method can be trained with limited style examples (100 or fewer) and run in real time (more than 40 FPS) on mobile devices. An ablation study demonstrates the effectiveness of each component in the framework.
While natural systems often exhibit collective intelligence that allows them to self-organize and adapt to changes, the equivalent is missing in most artificial systems. We explore the possibility of such a system in the context of cooperative object manipulation using mobile robots. Although conventional works demonstrate potential solutions to the problem in restricted settings, they have computational and learning difficulties. More importantly, these systems cannot adapt when facing environmental changes. In this work, we show that by distilling a planner derived from a gradient-based soft-body physics simulator into an attention-based neural network, our multi-robot manipulation system can achieve better performance than baselines. In addition, our system generalizes to configurations unseen during training and is able to adapt toward task completion when external turbulence and environmental changes are applied.
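Markov chain Monte Carlo (MCMC), such as Langevin dynamics, is valid for approximating intractable distributions. However, its usage is limited in the context of deep latent variable models owing to costly datapoint-wise sampling iterations and slow convergence. This paper proposes amortized Langevin dynamics (ALD), wherein datapoint-wise MCMC iterations are entirely replaced with updates of an encoder that maps observations into latent variables. This amortization enables efficient posterior sampling without datapoint-wise iterations. Despite its efficiency, we prove that ALD is valid as an MCMC algorithm, whose Markov chain has the target posterior as a stationary distribution under mild assumptions. Based on ALD, we also present a new deep latent variable model named the Langevin autoencoder (LAE). Interestingly, the LAE can be implemented by slightly modifying the traditional autoencoder. Using multiple synthetic datasets, we first validate that ALD can properly obtain samples from target posteriors. We also evaluate the LAE on an image generation task and show that it can outperform existing methods based on variational inference, such as the variational autoencoder, as well as other MCMC-based methods in terms of test likelihood.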
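For readers unfamiliar with the baseline that ALD amortizes, the following is a minimal sketch of plain (unadjusted) Langevin dynamics on a toy target. It is not the paper's ALD or LAE: the function name `langevin_samples`, the 1-D Gaussian target, and the step size are all assumptions chosen for illustration.

```python
import numpy as np

def langevin_samples(grad_log_p, z0, step=0.01, n_steps=5000, seed=0):
    """Unadjusted Langevin dynamics.

    Each iteration applies
        z <- z + step * grad log p(z) + sqrt(2 * step) * noise,
    with standard normal noise. For a small step size the chain
    approximately samples from p.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float)
    samples = np.empty((n_steps,) + z.shape)
    for t in range(n_steps):
        noise = rng.standard_normal(z.shape)
        z = z + step * grad_log_p(z) + np.sqrt(2.0 * step) * noise
        samples[t] = z
    return samples

# Toy target (an assumption for this sketch): a 1-D Gaussian N(MU, SIGMA^2).
MU, SIGMA = 2.0, 0.5

def grad_log_gaussian(z):
    # Gradient of log N(z; MU, SIGMA^2) with respect to z.
    return -(z - MU) / SIGMA**2

samples = langevin_samples(grad_log_gaussian, z0=np.zeros(1), step=0.01, n_steps=20000)
kept = samples[2000:]  # discard an initial burn-in segment
print("sample mean:", kept.mean(), "sample std:", kept.std())
```

In a latent variable model, a loop like this would have to run separately for every datapoint's latent variable, which is the cost the abstract describes as datapoint-wise iterations; ALD instead updates a shared encoder.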
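Human-level AI will have a significant impact on human society. However, estimates of when it will be realized are debated. As a path to human-level AI, artificial general intelligence (AGI), as opposed to AI systems specialized for specific tasks, has been set as a technically meaningful long-term goal. Now, propelled by advances in deep learning, that achievement is getting closer. Considering recent technological developments, it is meaningful to discuss the completion date of human-level AI through a "comprehensive technology map approach", in which we map human-level capabilities at a reasonable granularity, identify the current range of technology, discuss the technical challenges in traversing unexplored areas, and predict when they will be overcome. This paper presents a new argumentative option: viewing the ontological sextet, which encompasses entities in a way that is consistent with our everyday intuition and scientific practice, as a comprehensive technology map. Because most of an intelligent subject's modeling of the world, in terms of how to interpret it, consists of recognizing distal entities and predicting their temporal evolution, being able to handle all distal entities is a reasonable goal. Based on findings from philosophy and engineering cognitive technology, we predict that in the relatively distant future, AI will be able to recognize various entities to the same degree as humans.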
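Transient noise appearing in the data from gravitational-wave detectors frequently causes problems, such as instability of the detectors and overlapping with or mimicking gravitational-wave signals. Because transient noise is considered to be associated with the environment and the instrument, classifying it would help to understand its origin and improve detector performance. A previous study proposed an architecture for classifying transient noise using time-frequency 2D images (spectrograms), which combines unsupervised deep learning with a variational autoencoder and invariant information clustering. The proposed unsupervised learning architecture was applied to the Gravity Spy dataset, which consists of transient noise from the Advanced Laser Interferometer Gravitational-Wave Observatory (Advanced LIGO) together with its associated metadata, to discuss its potential for online or offline data analysis. In this study, focusing on the Gravity Spy dataset, we examine and report on the training process of the unsupervised learning architecture from the previous study.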
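Tidying up a household environment using a mobile manipulator poses various challenges in robotics, such as adapting to large real-world environmental variations and deploying safely and robustly in the presence of humans. A global competition held in September 2021 benchmarked tidying tasks in real home environments and, importantly, tested full system performance. For this challenge, we developed an entire household service robot system that leverages a data-driven approach to adapt to the numerous edge cases that occur during execution, instead of classical manually pre-programmed solutions. In this paper, we describe the core ingredients of the proposed robot system, including visual recognition, object manipulation, and motion planning. Our robot system won second prize, verifying the effectiveness and potential of data-driven robot systems for mobile manipulation in home environments.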
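For traditional point set registration algorithms, minimizing point-to-plane distances often yields better estimates of the rigid transformation than minimizing point-to-point distances. Nevertheless, recent deep-learning-based methods minimize point-to-point distances. In contrast to these methods, this paper proposes the first deep-learning-based approach to point-to-plane registration. A challenging part of this problem is that a typical solution for point-to-plane registration requires an iterative process of accumulating small transformations obtained by minimizing a linearized energy function. The iteration significantly increases the size of the computation graph needed for backpropagation and can slow down both forward and backward network evaluations. To solve this problem, we treat the estimated rigid transformation as a function of the input point clouds and derive its analytic gradients using the implicit function theorem. The analytic gradient we introduce is independent of how the minimizer of the error function (i.e., the rigid transformation) is obtained, which allows us to compute both the rigid transformation and its gradient efficiently. We implement the proposed point-to-plane registration module on top of several previous methods that minimize point-to-point distances, and demonstrate that the extensions outperform the base methods even for noisy point clouds and low-quality normals estimated from local point distributions.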
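The "linearized energy function" mentioned above can be illustrated with the classical small-angle point-to-plane step. This is a minimal sketch under stated assumptions, not the paper's differentiable module: correspondences are taken as given, the rotation is approximated as $R \approx I + [\omega]_\times$, and the function name `point_to_plane_step` is hypothetical. Under that approximation the residual for each pair is $(p_i - q_i)\cdot n_i + \omega\cdot(p_i \times n_i) + t\cdot n_i$, which is linear in $(\omega, t)$.

```python
import numpy as np

def point_to_plane_step(src, dst, normals):
    """One linearized point-to-plane least-squares step.

    Minimizes sum_i ((p_i + w x p_i + t - q_i) . n_i)^2 over the small
    rotation vector w and translation t, where p_i = src[i],
    q_i = dst[i], and n_i = normals[i] (unit normals at q_i).
    """
    # Each row of A is [p_i x n_i, n_i]; b_i = (q_i - p_i) . n_i.
    A = np.hstack([np.cross(src, normals), normals])  # shape (N, 6)
    b = np.einsum("ij,ij->i", dst - src, normals)     # shape (N,)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:3], x[3:]  # (w, t)

# Synthetic check (hypothetical data): the source is the target shifted
# by a known translation, so the step should recover it with w close to 0.
rng = np.random.default_rng(0)
dst = rng.standard_normal((200, 3))
normals = rng.standard_normal((200, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
t_true = np.array([0.05, -0.02, 0.03])
src = dst - t_true

omega, t = point_to_plane_step(src, dst, normals)
print("rotation vector:", omega)
print("translation:", t)
```

A full registration loop would repeat this step, re-estimating correspondences and accumulating the small transformations; it is that loop which enlarges the computation graph and which the abstract's implicit-function-theorem gradient is meant to avoid differentiating through.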